Abstract

The idea of a finite collection of closed sets having "strongly regu- 
lar intersection" at a given point is crucial in variational analysis. We 
show that this central theoretical tool also has striking algorithmic 
consequences. Specifically, we consider the case of two sets, one of 
which we assume to be suitably "regular" (special cases being convex 
sets, smooth manifolds, or feasible regions satisfying the Mangasarian- 
Fromovitz constraint qualification). We then prove that von Neu- 
mann's method of "alternating projections" converges locally to a 
point in the intersection, at a linear rate associated with a modulus 
of regularity. As a consequence, in the case of several arbitrary closed 
sets having strongly regular intersection at some point, the method 
of "averaged projections" converges locally at a linear rate to a point 
in the intersection. Inexact versions of both algorithms also converge 
linearly.

1 Introduction



An important theme in computational mathematics is the relationship be- 
tween the "conditioning" of a problem instance and the speed of convergence 
of iterative solution algorithms on that instance. A classical example is the 
method of conjugate gradients for solving a positive definite system of linear 
equations: we can bound the linear convergence rate in terms of the relative 
condition number of the associated matrix. More generally, Renegar [32-34] 
showed that the rate of convergence of interior-point methods for conic con- 
vex programming can be bounded in terms of the "distance to ill-posedness" 
of the program. 

In studying the convergence of iterative algorithms for nonconvex min- 
imization problems or nonmonotone variational inequalities, we must con- 
tent ourselves with a local theory. A suitable analogue of the distance to 
ill-posedness is then the notion of "metric regularity", fundamental in vari-
ational analysis. Loosely speaking, a generalized equation, such as a system
of inequalities, for example, is metrically regular when, locally, we can bound 
the distance from a trial solution to an exact solution by a constant multiple 
of the error in the equation generated by the trial solution. The constant 
needed is called the "regularity modulus", and its reciprocal has a natural 
interpretation as a distance to ill-posedness for the equation [15]. 

This philosophy suggests understanding the speed of convergence of algo- 
rithms for solving generalized equations in terms of the regularity modulus 
at a solution. Recent literature focuses in particular on the proximal point 
algorithm (see for example [1,22,29]). A unified approach to the relationship 
between metric regularity and the linear convergence of a family of concep- 
tual algorithms appears in [23]. 

We here study a very basic algorithm for a very basic problem. We 
consider the problem of finding a point in the intersection of several closed 
sets, using the method of averaged projections: at each step, we project the 
current iterate onto each set, and average the results to obtain the next 
iterate. Global convergence of this method in the case of two closed convex 
sets was proved in 1969 in [2]. In this work we show, in complete generality,
that this method converges locally to a point in the intersection of the sets, 
at a linear rate governed by an associated regularity modulus. Our linear 
convergence proof is elementary: although we use the idea of the normal 
cone, we apply only the definition, and we discuss metric regularity only to 
illuminate the rate of convergence.

Our approach to the convergence of the method of averaged projections
is standard [4,30]: we identify the method with von Neumann's alternating 
projection algorithm [40] on two closed sets (one of which is a linear subspace) 
in a suitable product space. A nice development of the classical method of 
alternating projections may be found in [11]. The linear convergence of the 
method for two closed convex sets with regular intersection was proved in [4], 
strengthening a classical result of [21]. Remarkably, we show that, assuming 
strong regularity, local linear convergence requires good geometric properties 
(such as convexity, smoothness, or more generally, "amenability" or "prox-
regularity") of only one of the two sets.

One consequence of our convergence proof is an algorithmic demonstra- 
tion of the "exact extremal principle" described in [26, Theorem 2.8]. This 
result, a unifying theme in [26], asserts that if several sets have strongly reg- 
ular intersection at a point, then that point is not "locally extremal" [26]: 
in other words, translating the sets by small vectors cannot render the in- 
tersection empty locally. To prove this result, we simply apply the method 
of averaged projections, starting from the point of regular intersection. In 
a further section, we show that inexact versions of the method of averaged 
projections, closer to practical implementations, also converge linearly. 

The method of averaged projections is a conceptual algorithm that might 
appear hard to implement on concrete nonconvex problems. However, the 
projection problem for some nonconvex sets is relatively easy. A good exam- 
ple is the set of matrices of some fixed rank: given a singular value decom- 
position of a matrix, projecting it onto this set is immediate. Furthermore, 
nonconvex iterated projection algorithms and analogous heuristics are quite 
popular in practice, in areas such as inverse eigenvalue problems [7,8], pole 
placement [27,42], information theory [39], low-order control design [19,20,28] 
and image processing [5,41]. Previous convergence results on nonconvex
alternating projection algorithms have been uncommon, and have either fo- 
cussed on a very special case (see for example [7,25]), or have been much 
weaker than for the convex case [10,39]. For more discussion, see [25]. 

Our results primarily concern R-linear convergence: in other words, we 
show that our sequences of iterates converge, with error bounded by a ge-
ometric sequence. In a later section, we employ a completely different ap-
proach to show that the method of averaged projections, for prox-regular sets
with regular intersection, has a Q-linear convergence property: each iteration
guarantees a fixed rate of improvement. In a final section, we illustrate these
theoretical results with an elementary numerical example coming from signal
processing.



2 Notation and definitions 

We begin by fixing some notation and definitions. Our underlying setting 
throughout this work is a Euclidean space $E$ with corresponding closed unit
ball $B$. For any point $x \in E$ and radius $\rho > 0$, we write $B_\rho(x)$ for the set
$x + \rho B$.

Consider first two sets $F, G \subset E$. A point $\bar{x} \in F \cap G$ is locally extremal [26]
for this pair of sets if restricting to a neighborhood of $\bar{x}$ and then translating
the sets by small distances can render their intersection empty: in other
words, there exists a $\rho > 0$ and a sequence of vectors $z_r \to 0$ in $E$ such that

    $(F + z_r) \cap G \cap B_\rho(\bar{x}) = \emptyset$ for all $r = 1, 2, \ldots$.

Clearly $\bar{x}$ is not locally extremal if and only if

    $0 \in \operatorname{int}\big(((F - \bar{x}) \cap \rho B) - ((G - \bar{x}) \cap \rho B)\big)$ for all $\rho > 0$.

For recognition purposes, it is easier to study a weaker property than local 
extremality. Following the terminology of [24], we say the two sets $F, G \subset E$
have strongly regular intersection at the point $\bar{x} \in F \cap G$ if there exists a
constant $\alpha > 0$ such that

    $\alpha \rho B \subset ((F - x) \cap \rho B) - ((G - z) \cap \rho B)$

for all points $x \in F$ near $\bar{x}$ and $z \in G$ near $\bar{x}$. By considering the case
$x = z = \bar{x}$, we see that strong regularity implies that $\bar{x}$ is not locally extremal.
This "primal" definition of strong regularity is often not the most convenient 
way to handle strong regularity, either conceptually or theoretically. By 
contrast, a "dual" approach, using normal cones, is very helpful. 

Given a set $F \subset E$, we define the distance function and (multivalued)
projection for $F$ by

    $d_F(x) = d(x, F) = \inf\{\|z - x\| : z \in F\}$
    $P_F(x) = \operatorname{argmin}\{\|z - x\| : z \in F\}.$

The central tool in variational analysis is the normal cone to a closed set
$F \subset E$ at a point $\bar{x} \in F$, which can be defined (see [9,26,35]) as

    $N_F(\bar{x}) = \big\{ \lim_i t_i (x_i - z_i) : t_i \ge 0,\ x_i \to \bar{x},\ z_i \in P_F(x_i) \big\}.$

Notice two properties in particular. First,

(2.1)    $z \in P_F(x) \;\Longrightarrow\; x - z \in N_F(z).$

Secondly, the normal cone is a "closed" multifunction: for any sequence of
points $x_r \to \bar{x}$ in $F$, any limit of a sequence of normals $y_r \in N_F(x_r)$ must lie
in $N_F(\bar{x})$. Indeed, the definition of the normal cone is in some sense driven
by these two properties: it is the smallest cone satisfying the two properties.
Notice also that we have the equivalence: $N_F(\bar{x}) = \{0\} \iff \bar{x} \in \operatorname{int} F$.
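
To make property (2.1) concrete, here is a minimal sketch (an illustration added here, not taken from the paper) in which the set is assumed to be the unit circle in the plane. For that set the projection of a nonzero point is its normalization, and the normal cone at a point z of the circle is the line spanned by z. The helper name `project_circle` is hypothetical.

```python
import math

def project_circle(x):
    """Return one nearest point on the unit circle to x (a 2-tuple).

    For x != 0 the projection is unique; at the origin every point of the
    circle is nearest, so we return an arbitrary representative.
    """
    norm = math.hypot(x[0], x[1])
    if norm == 0.0:
        return (1.0, 0.0)  # arbitrary choice from the multivalued projection
    return (x[0] / norm, x[1] / norm)

x = (3.0, 4.0)
z = project_circle(x)
residual = (x[0] - z[0], x[1] - z[1])
# 2-D cross product of the residual with z; a zero value means the two are
# parallel, i.e. x - z lies on the line spanned by z (the normal cone at z).
cross = residual[0] * z[1] - residual[1] * z[0]
```

The case of the origin shows why the projection has to be treated as multivalued: every point of the circle is then a nearest point.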

Normal cones provide an elegant alternative approach to defining strong
regularity. In general, a family of closed sets $F_1, F_2, \ldots, F_m \subset E$ has strongly
regular intersection at a point $\bar{x} \in \cap_i F_i$ if the only solution to the system

    $y_i \in N_{F_i}(\bar{x}) \quad (i = 1, 2, \ldots, m)$
    $\sum_{i=1}^m y_i = 0$

is $y_i = 0$ for $i = 1, 2, \ldots, m$. In the case $m = 2$, this condition can be written

    $N_{F_1}(\bar{x}) \cap -N_{F_2}(\bar{x}) = \{0\},$

and it is equivalent to our previous definition (see [24, Cor 2], for example).
We also note that this condition appears throughout variational-analytic the-
ory. For example, it guarantees the important inclusion (see [35, Theorem
6.42])

    $N_{F_1 \cap \cdots \cap F_m}(\bar{x}) \subset N_{F_1}(\bar{x}) + \cdots + N_{F_m}(\bar{x}).$

We will find it helpful to quantify the notion of strong regularity (cf. [24]). 
A straightforward compactness argument shows the following result. 

Proposition 2.2 (quantifying strong regularity) A collection of closed
sets $F_1, F_2, \ldots, F_m \subset E$ has strongly regular intersection at a point $\bar{x} \in \cap_i F_i$
if and only if there exists a constant $k > 0$ such that the following condition
holds:

(2.3)    $y_i \in N_{F_i}(\bar{x}) \ (i = 1, 2, \ldots, m) \;\Longrightarrow\; \sqrt{\sum_i \|y_i\|^2} \le k \Big\| \sum_i y_i \Big\|.$

We define the condition modulus $\mathrm{cond}(F_1, F_2, \ldots, F_m | \bar{x})$ to be the infimum
of all constants $k > 0$ such that property (2.3) holds. Using the triangle
and Cauchy-Schwarz inequalities, we notice that vectors $y_1, y_2, \ldots, y_m \in E$
always satisfy the inequality

(2.4)    $\sum_i \|y_i\|^2 \ge \frac{1}{m} \Big\| \sum_i y_i \Big\|^2,$

which yields

(2.5)    $\mathrm{cond}(F_1, F_2, \ldots, F_m | \bar{x}) \ge \frac{1}{\sqrt{m}},$

except in the special case when $N_{F_i}(\bar{x}) = \{0\}$ (or equivalently $\bar{x} \in \operatorname{int} F_i$) for
all $i = 1, 2, \ldots, m$; in this case the condition modulus is zero.
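
As a worked illustration of the condition modulus (an example added here, not part of the original text), take two distinct lines through the origin in the plane. The unit normals $n_1, n_2$, the angle $\theta$, and the coefficients $a_1, a_2$ are notation assumed only for this example.

```latex
% Assumed setting: lines F_1, F_2 through 0 in R^2 with unit normals n_1, n_2,
% <n_1, n_2> = cos(theta), 0 < theta < pi, so that N_{F_i}(0) = R n_i.
\begin{align*}
y_i &= a_i n_i \quad (a_i \in \mathbf{R}), \qquad \|y_1\|^2 + \|y_2\|^2 = a_1^2 + a_2^2, \\
\|y_1 + y_2\|^2 &= a_1^2 + a_2^2 + 2 a_1 a_2 \cos\theta
   \;\ge\; (1 - |\cos\theta|)\,(a_1^2 + a_2^2), \\
\mathrm{cond}(F_1, F_2 \,|\, 0) &= \frac{1}{\sqrt{1 - |\cos\theta|}} \;\ge\; 1 \;>\; \frac{1}{\sqrt{2}}.
\end{align*}
```

The middle inequality uses $2|a_1 a_2| \le a_1^2 + a_2^2$, with equality when $|a_1| = |a_2|$ and the sign of $a_1 a_2 \cos\theta$ is nonpositive. The modulus grows without bound as the two lines approach one another, which is the sense in which it measures conditioning.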

One goal of this paper is to show that, far from being of purely an- 
alytic significance, strong regularity has central algorithmic consequences, 
specifically for the method of averaged projections for finding a point in the 
intersection $\cap_i F_i$. Given any initial point $x_0 \in E$, the algorithm proceeds
iteratively as follows:

    $z_n^i \in P_{F_i}(x_n) \quad (i = 1, 2, \ldots, m)$
    $x_{n+1} = \frac{1}{m}\big(z_n^1 + z_n^2 + \cdots + z_n^m\big).$

Our main result shows, assuming only strong regularity, that provided the
initial point $x_0$ is near $\bar{x}$, any sequence $x_1, x_2, x_3, \ldots$ generated by the method
of averaged projections converges linearly to a point in the intersection $\cap_i F_i$,
at a rate governed by the condition modulus. 
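
The iteration just described can be sketched in a few lines. The following is only an illustration under assumed data, not an implementation from the paper: the two sets are taken to be the unit circle (nonconvex) and the horizontal line at height 1/2 in the plane, which meet transversally, and the function names are hypothetical.

```python
import math

def project_circle(x):
    """Nearest point on the unit circle (arbitrary choice at the origin)."""
    norm = math.hypot(x[0], x[1])
    return (1.0, 0.0) if norm == 0.0 else (x[0] / norm, x[1] / norm)

def project_line(x, height=0.5):
    """Nearest point on the horizontal line {(s, t) : t = height}."""
    return (x[0], height)

def averaged_projections(x0, projectors, steps):
    """Iterate x_{n+1} = average of the projections of x_n onto each set."""
    x = x0
    history = [x]
    for _ in range(steps):
        images = [p(x) for p in projectors]
        m = len(images)
        x = (sum(z[0] for z in images) / m, sum(z[1] for z in images) / m)
        history.append(x)
    return history

# Start near the intersection point (sqrt(3)/2, 1/2).
history = averaged_projections((1.0, 0.6), [project_circle, project_line], 200)
limit = history[-1]
```

Printing the distance from successive entries of `history` to the point $(\sqrt{3}/2, 1/2)$ is a simple way to observe the geometric decay of the error in this example.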



3 Strong and metric regularity 

The notion of strong regularity is well-known to be closely related to another 
central idea in variational analysis: "metric regularity". A concise summary
of the relationships between a variety of regular intersection properties and 
metric regularity appears in [24]. We summarize the relevant ideas here. 

Consider a set-valued mapping $\Phi : E \rightrightarrows Y$, where $Y$ is a second Euclidean
space. The inverse mapping $\Phi^{-1} : Y \rightrightarrows E$ is defined by

    $x \in \Phi^{-1}(y) \iff y \in \Phi(x)$, for $x \in E$, $y \in Y$.

For vectors $\bar{x} \in E$ and $\bar{y} \in \Phi(\bar{x})$, we say $\Phi$ is metrically regular at $\bar{x}$ for $\bar{y}$
if there exists a constant $k > 0$ such that all vectors $x \in E$ close to $\bar{x}$ and
vectors $y \in Y$ close to $\bar{y}$ satisfy

    $d(x, \Phi^{-1}(y)) \le k\, d(y, \Phi(x)).$

Intuitively, this inequality gives a local linear bound for the distance to a
solution of the generalized equation $y \in \Phi(x)$ (where the vector $y$ is given
and we seek the unknown vector $x$), in terms of the distance from $y$
to the set $\Phi(x)$. The infimum of all such constants $k$ is called the modulus
of metric regularity of $\Phi$ at $\bar{x}$ for $\bar{y}$, denoted $\mathrm{reg}\,\Phi(\bar{x}|\bar{y})$. This modulus is
a measure of the sensitivity or "conditioning" of the generalized equation
$y \in \Phi(x)$. To take one simple example, if $\Phi$ is a single-valued linear map,
the modulus of regularity is the reciprocal of its smallest singular value. In
general, variational analysis provides a powerful calculus for computing the
regularity modulus. In particular, we have the following formula [35, Thm
9.43]:

(3.1)    $\frac{1}{\mathrm{reg}\,\Phi(\bar{x}|\bar{y})} = \min\big\{ d(0, D^*\Phi(\bar{x}|\bar{y})(w)) : w \in Y,\ \|w\| = 1 \big\},$

where $D^*$ denotes the "coderivative".
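
The linear case mentioned above can be checked by hand. The following worked example is an illustration added here; the particular matrix $A$ is an assumption chosen for simplicity.

```latex
% Assumed example: an invertible diagonal map on R^2 with singular values 2 and 1/2.
\begin{align*}
\Phi(x) &= Ax, \qquad A = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac{1}{2} \end{pmatrix},
  \qquad \Phi^{-1}(y) = \{A^{-1} y\}, \\
d(x, \Phi^{-1}(y)) &= \|A^{-1}(Ax - y)\| \;\le\; \|A^{-1}\| \, \|Ax - y\| \;=\; 2\, d(y, \Phi(x)).
\end{align*}
```

Equality holds whenever $Ax - y$ is a multiple of $(0, 1)$, so no constant smaller than 2 works, and the modulus equals $2$, the reciprocal of the smallest singular value $1/2$.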

We now study these ideas for a particular mapping, highlighting the con-
nections between metric and strong regularity. As in the previous section,
consider closed sets $F_1, F_2, \ldots, F_m \subset E$ and a point $\bar{x} \in \cap_i F_i$. We endow the
space $E^m$ with the inner product

    $\big\langle (x_1, x_2, \ldots, x_m), (y_1, y_2, \ldots, y_m) \big\rangle = \sum_i \langle x_i, y_i \rangle,$

and define a set-valued mapping $\Phi : E \rightrightarrows E^m$ by

    $\Phi(x) = (F_1 - x) \times (F_2 - x) \times \cdots \times (F_m - x).$

Then the inverse mapping is given by

    $\Phi^{-1}(y) = \bigcap_i (F_i - y_i)$, for $y \in E^m$,

and finding a point in the intersection $\cap_i F_i$ is equivalent to finding a solu-
tion of the generalized equation $0 \in \Phi(x)$. By definition, the mapping $\Phi$ is
metrically regular at $\bar{x}$ for $0$ if and only if there is a constant $k > 0$ such that
the following strong metric inequality holds:

(3.2)    $d\Big(x, \bigcap_i (F_i - z_i)\Big) \le k \sqrt{\sum_i d^2(x, F_i - z_i)}$ for all $(x, z)$ near $(\bar{x}, 0)$.

Furthermore, the regularity modulus $\mathrm{reg}\,\Phi(\bar{x}|0)$ is just the infimum of those
constants $k > 0$ such that inequality (3.2) holds.

To compute the coderivative $D^*\Phi(\bar{x}|0)$, we decompose the mapping $\Phi$ as
$\Psi - A$, where, for points $x \in E$,

    $\Psi(x) = F_1 \times F_2 \times \cdots \times F_m$
    $Ax = (x, x, \ldots, x).$

The calculus rule [35, 10.43] yields $D^*\Phi(\bar{x}|0) = D^*\Psi(\bar{x}|A\bar{x}) - A^*$. Then, by
definition,

    $v \in D^*\Psi(\bar{x}|A\bar{x})(w) \iff (v, -w) \in N_{\mathrm{gph}\,\Psi}(\bar{x}, A\bar{x}),$

and since $\mathrm{gph}\,\Psi = E \times F_1 \times F_2 \times \cdots \times F_m$, we deduce

    $D^*\Psi(\bar{x}|A\bar{x})(w) = \{0\}$ if $w_i \in -N_{F_i}(\bar{x})$ for all $i$, and $\emptyset$ otherwise,

and hence

    $D^*\Phi(\bar{x}|0)(w) = \{-\sum_i w_i\}$ if $w_i \in -N_{F_i}(\bar{x})$ for all $i$, and $\emptyset$ otherwise.

From the coderivative formula (3.1) we now obtain

(3.3)    $\frac{1}{\mathrm{reg}\,\Phi(\bar{x}|0)} = \min\Big\{ \Big\| \sum_i y_i \Big\| : \sum_i \|y_i\|^2 = 1,\ y_i \in N_{F_i}(\bar{x}) \Big\},$

where, following the usual convention, we interpret the right-hand side as
$+\infty$ if $N_{F_i}(\bar{x}) = \{0\}$ (or equivalently $\bar{x} \in \operatorname{int} F_i$) for all $i = 1, 2, \ldots, m$. Thus
the regularity modulus agrees exactly with the condition modulus that we
defined in the previous section:

    $\mathrm{reg}\,\Phi(\bar{x}|0) = \mathrm{cond}(F_1, F_2, \ldots, F_m | \bar{x}).$

Furthermore, as is well-known [24], strong regularity is equivalent to the
strong metric inequality (3.2).

4 Clarke regularity and refinements



Even more central than strong regularity in variational analysis is the concept 
of "Clarke regularity". In this section we study a slight refinement, crucial 
for our development. In the interest of maintaining as elementary an approach
as possible, we use the following geometric definition of Clarke regularity. 

Definition 4.1 (Clarke regularity) A closed set $C \subset \mathbf{R}^n$ is Clarke regular
at a point $\bar{x} \in C$ if, given any $\delta > 0$, any two points $u, z$ near $\bar{x}$ with $z \in C$,
and any point $y \in P_C(u)$, satisfy

    $\langle z - \bar{x}, u - y \rangle \le \delta \|z - \bar{x}\| \cdot \|u - y\|.$

In other words, the angle between the vectors $z - \bar{x}$ and $u - y$, whenever it
is defined, cannot be much less than $\pi/2$ when the points $u$ and $z$ are near $\bar{x}$.

Remark 4.2 This property is equivalent to the standard notion of Clarke
regularity. To see this, suppose the property in the definition holds. Consider
any unit vector $v \in N_C(\bar{x})$, and any unit "tangent direction" $w$ to $C$ at $\bar{x}$.
By definition, there exist sequences $u_r \to \bar{x}$, $y_r \in P_C(u_r)$, and $z_r \to \bar{x}$ with
$z_r \in C$, such that

    $\frac{u_r - y_r}{\|u_r - y_r\|} \to v$ and $\frac{z_r - \bar{x}}{\|z_r - \bar{x}\|} \to w.$

By assumption, given any $\epsilon > 0$, for all large $r$ the angle between the two
vectors on the left-hand side is at least $\pi/2 - \epsilon$, and hence so is the angle
between $v$ and $w$. Thus $\langle v, w \rangle \le 0$, so Clarke regularity follows, by [35, Cor
6.29]. Conversely, if the property described in the definition fails, then for
some $\epsilon > 0$ and some sequences $u_r \to \bar{x}$, $y_r \in P_C(u_r)$, and $z_r \to \bar{x}$ with
$z_r \in C$, the angle between the unit vectors

(4.3)    $\frac{u_r - y_r}{\|u_r - y_r\|}$ and $\frac{z_r - \bar{x}}{\|z_r - \bar{x}\|}$

is less than $\pi/2 - \epsilon$. Then any cluster points $v$ and $w$ of the two sequences (4.3)
are respectively an element of $N_C(\bar{x})$ and a tangent direction to $C$ at $\bar{x}$, and
satisfy $\langle v, w \rangle > 0$, contradicting Clarke regularity.

The property we need for our development is an apparently-slight modi-
fication of Clarke regularity.



Definition 4.4 (super-regularity) A closed set $C \subset \mathbf{R}^n$ is super-regular
at a point $\bar{x} \in C$ if, given any $\delta > 0$, any two points $u, z$ near $\bar{x}$ with $z \in C$,
and any point $y \in P_C(u)$, satisfy

    $\langle z - y, u - y \rangle \le \delta \|z - y\| \cdot \|u - y\|.$

In other words, the angle between the vectors $z - y$ and $u - y$, whenever it
is defined, cannot be much less than $\pi/2$ when the points $u$ and $z$ are near $\bar{x}$.
An equivalent statement involves the normal cone.

Proposition 4.5 (super-regularity and normal angles) A closed set
$C \subset \mathbf{R}^n$ is super-regular at a point $\bar{x} \in C$ if and only if, for all $\delta > 0$, the
inequality

    $\langle v, z - y \rangle \le \delta \|v\| \cdot \|z - y\|$

holds for all points $y, z \in C$ near $\bar{x}$ and all normal vectors $v \in N_C(y)$.

Proof Super-regularity follows immediately from the normal cone property
described in the proposition, by property (2.1). Conversely, suppose the nor-
mal cone property fails, so for some $\delta > 0$ and sequences of distinct points
$y_r, z_r \in C$ approaching $\bar{x}$ and unit normal vectors $v_r \in N_C(y_r)$, we have, for
all $r = 1, 2, \ldots$,

    $\Big\langle v_r, \frac{z_r - y_r}{\|z_r - y_r\|} \Big\rangle > \delta.$

Fix an index $r$. By definition of the normal cone, there exist sequences
of distinct points $u_r^j \to y_r$ and $y_r^j \in P_C(u_r^j)$ such that

    $\lim_{j \to \infty} \frac{u_r^j - y_r^j}{\|u_r^j - y_r^j\|} = v_r.$

Since $\lim_j y_r^j = y_r$, we must have, for all large $j$,

    $\Big\langle \frac{u_r^j - y_r^j}{\|u_r^j - y_r^j\|}, \frac{z_r - y_r^j}{\|z_r - y_r^j\|} \Big\rangle > \delta.$

Choose $j$ sufficiently large to ensure both the above inequality and the in-
equality $\|u_r^j - y_r\| < 1/r$, and then define points $u_r' = u_r^j$ and $y_r' = y_r^j$.
We now have sequences of points $u_r', z_r$ approaching $\bar{x}$ with $z_r \in C$, and
$y_r' \in P_C(u_r')$, and satisfying

    $\Big\langle \frac{u_r' - y_r'}{\|u_r' - y_r'\|}, \frac{z_r - y_r'}{\|z_r - y_r'\|} \Big\rangle > \delta.$

Hence $C$ is not super-regular at $\bar{x}$. □
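
The normal-cone characterization in Proposition 4.5 can be verified directly for a simple nonconvex set. The following worked example is an illustration added here; it assumes $C$ is the unit circle in the plane, whose normal cone at a point $y$ is the line spanned by $y$, and the angles $a, b$ and scalar $\lambda$ are notation for this example only.

```latex
% Assumed example: C = unit circle in R^2, points parametrized by angles a and b.
\begin{align*}
y &= (\cos a, \sin a), \quad z = (\cos b, \sin b), \quad
  v = \lambda y \in N_C(y) \quad (\lambda \in \mathbf{R}), \\
\langle v, z - y \rangle &= \lambda\big(\cos(b - a) - 1\big)
  \;\le\; 2|\lambda| \sin^2\!\big(\tfrac{b - a}{2}\big), \\
\|v\| \cdot \|z - y\| &= |\lambda| \cdot 2\,\big|\sin\big(\tfrac{b - a}{2}\big)\big|, \\
\langle v, z - y \rangle &\le \big|\sin\big(\tfrac{b - a}{2}\big)\big| \cdot \|v\| \cdot \|z - y\|
  \;\le\; \tfrac{|b - a|}{2}\, \|v\| \cdot \|z - y\|.
\end{align*}
```

So for any $\delta > 0$ the inequality of the proposition holds as soon as $|b - a| \le 2\delta$, which confirms super-regularity of the circle at each of its points.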



Super-regularity is a strictly stronger property than Clarke regularity, as 
the following result and example make clear. 

Corollary 4.6 (super-regularity implies Clarke regularity)
If a closed set $C \subset \mathbf{R}^n$ is super-regular at a point, then it is also Clarke
regular there.

Proof Suppose the point in question is $\bar{x}$. Fix any $\delta > 0$, and set $y = \bar{x}$
in Proposition 4.5. Then clearly any unit tangent direction $d$ to $C$ at $\bar{x}$ and
any unit normal vector $v \in N_C(\bar{x})$ satisfy $\langle v, d \rangle \le \delta$. Since $\delta$ was arbitrary,
in fact $\langle v, d \rangle \le 0$, so Clarke regularity follows by [35, Cor 6.29]. □



Example 4.7 Consider the following function $f : \mathbf{R} \to (-\infty, +\infty]$, taken
from an example in [37]:

    $f(t) = 2^r (t - 2^r)$   if $2^r \le t < 2^{r+1}$ for some $r \in \mathbf{Z}$,
    $f(t) = 0$               if $t = 0$,
    $f(t) = +\infty$         if $t < 0$.

The epigraph of this function is Clarke regular at $(0,0)$, but it is not hard
to see that it is not super-regular there. Indeed, a minor refinement of this
example (smoothing the set slightly close to the nonsmooth points $(2^r, 0)$
and $(2^r, 4^{r-1})$) shows that a set can be everywhere Clarke regular, and yet
not super-regular.
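
To get a feel for the piecewise definition in Example 4.7, the function can be evaluated numerically. The sketch below is an illustration added here, with the hypothetical name `f_example`; it assumes floating-point inputs and uses `math.frexp` to find the integer $r$ with $2^r \le t < 2^{r+1}$.

```python
import math

def f_example(t):
    """Evaluate the function of Example 4.7 at a real number t."""
    if t < 0:
        return math.inf
    if t == 0:
        return 0.0
    # frexp gives t = m * 2**e with 0.5 <= m < 1, so 2**(e-1) <= t < 2**e.
    _, e = math.frexp(t)
    r = e - 1
    return 2.0 ** r * (t - 2.0 ** r)

values = [f_example(t) for t in (0.75, 1.0, 1.5, 2.0, 3.0)]
```

On each dyadic interval the function climbs linearly from 0 and then drops back to 0 at the next power of two, which is where the nonsmooth points named in the example come from.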

Super-regularity is a common property: indeed, it is implied by two well-
known properties, that we discuss next. Following [35], we say that a set
$C \subset \mathbf{R}^n$ is amenable at a point $\bar{x} \in C$ when there exists a neighborhood $U$
of $\bar{x}$, a $C^1$ mapping $G : U \to \mathbf{R}^\ell$, and a closed convex set $D \subset \mathbf{R}^\ell$ containing
$G(\bar{x})$, and satisfying the constraint qualification

(4.8)    $N_D(G(\bar{x})) \cap \ker(\nabla G(\bar{x})^*) = \{0\},$

such that points $x \in \mathbf{R}^n$ near $\bar{x}$ lie in $C$ exactly when $G(x) \in D$. In particular,
if $C$ is defined by $C^1$ equality and inequality constraints and the Mangasarian-
Fromovitz constraint qualification holds at $\bar{x}$, then $C$ is amenable at $\bar{x}$.

Proposition 4.9 (amenable implies super-regular) If a closed set
$C \subset \mathbf{R}^n$ is amenable at a point in $C$, then it is super-regular there.

Proof Suppose the result fails at some point $\bar{x} \in C$. Assume as in the
definition of amenability that, in a neighborhood of $\bar{x}$, the set $C$ is identical
with the inverse image $G^{-1}(D)$, where the $C^1$ map $G$ and the closed convex
set $D$ satisfy the condition (4.8). Then by definition, for some $\delta > 0$, there
are sequences of points $y_r, z_r \in C$ converging to $\bar{x}$ and unit normal vectors
$v_r \in N_C(y_r)$ satisfying

    $\langle v_r, z_r - y_r \rangle > \delta \|z_r - y_r\|$, for all $r = 1, 2, \ldots$.

It is easy to check the condition

    $N_D(G(y_r)) \cap \ker(\nabla G(y_r)^*) = \{0\},$

for all large $r$, since otherwise we contradict assumption (4.8). Consequently,
using the standard chain rule from [35], we deduce

    $N_C(y_r) = \nabla G(y_r)^* N_D(G(y_r)),$

so there are normal vectors $u_r \in N_D(G(y_r))$ such that $\nabla G(y_r)^* u_r = v_r$. The
sequence $(u_r)$ must be bounded, since otherwise, by taking a subsequence,
we could suppose $\|u_r\| \to \infty$ and $\|u_r\|^{-1} u_r$ approaches some unit vector $\hat{u}$,
leading to the contradiction

    $\hat{u} \in N_D(G(\bar{x})) \cap \ker(\nabla G(\bar{x})^*) = \{0\}.$

For all large $r$, we now have

    $\langle \nabla G(y_r)^* u_r, z_r - y_r \rangle > \delta \|z_r - y_r\|,$

and by convexity we know

    $\langle u_r, G(z_r) - G(y_r) \rangle \le 0.$

Subtracting the first inequality from the second gives

    $\langle u_r, G(z_r) - G(y_r) - \nabla G(y_r)(z_r - y_r) \rangle < -\delta \|z_r - y_r\|.$

But as $r \to \infty$, the left-hand side is $o(\|z_r - y_r\|)$, since the sequence $(u_r)$ is
bounded and $G$ is $C^1$. This contradiction completes the proof. □

A rather different refinement of Clarke regularity is the notion of "prox-
regularity". Following [31, Thm 1.3], we call a set $C \subset E$ prox-regular
at a point $\bar{x} \in C$ if the projection mapping $P_C$ is single-valued around $\bar{x}$.
(In this case, clearly $C$ must be locally closed around $\bar{x}$.) For example, if,
in the definition of an amenable set that we gave earlier, we strengthen our
assumption on the map $G$ to be $C^2$ rather than just $C^1$, the resulting set
must be prox-regular. On the other hand, the set

    $\{(s, t) \in \mathbf{R}^2 : t = |s|^{3/2}\}$

is amenable at the point $(0,0)$ (and hence super-regular there), but is not
prox-regular there.

Proposition 4.10 (prox-regular implies super-regular) If a closed set
$C \subset \mathbf{R}^n$ is prox-regular at a point in $C$, then it is super-regular there.

Proof If the result fails at $\bar{x} \in C$, then for some constant $\delta > 0$, there exist
sequences of points $y_r, z_r \in C$ converging to the point $\bar{x}$, and a sequence of
normal vectors $v_r \in N_C(y_r)$ satisfying the inequality

    $\langle v_r, z_r - y_r \rangle > \delta \|v_r\| \cdot \|z_r - y_r\|.$

By [31, Proposition 1.2], there exist constants $\epsilon, \rho > 0$ such that

    $\Big\langle \frac{\epsilon}{2\|v_r\|} v_r, z_r - y_r \Big\rangle \le \frac{\rho}{2} \|z_r - y_r\|^2$

for all large $r$. This gives a contradiction, since $\|z_r - y_r\| < \epsilon\delta/\rho$ eventually. □

Super-regularity is related to various other notions in the literature. We
end this section with a brief digression to discuss these relationships. First
note the following equivalent definition, which is an immediate consequence
of Proposition 4.5, and which gives an alternate proof of Proposition 4.10 via
"hypomonotonicity" of the truncated normal cone mapping $x \mapsto N_C(x) \cap B$
for prox-regular sets $C$ [31, Thm 1.3].

Corollary 4.11 (approximate monotonicity) A closed set $C \subset \mathbf{R}^n$ is
super-regular at a point $\bar{x} \in C$ if and only if, for all $\delta > 0$, the inequality

    $\langle v - w, y - z \rangle \ge -\delta \|y - z\|$

holds for all points $y, z \in C$ near $\bar{x}$ and all normal vectors $v \in N_C(y) \cap B$
and $w \in N_C(z) \cap B$.

If we replace the normal cone $N_C$ in the property described in the result above
by its convex hull, the "Clarke normal cone", we obtain a stronger property,
called "subsmoothness" in [3]. Similar proofs to those above show that,
like super-regularity, subsmoothness is a consequence of either amenability
or prox-regularity. However, subsmoothness is strictly stronger than super-
regularity. To see this, consider the graph of the function $f : \mathbf{R} \to \mathbf{R}$ defined
by the following properties: $f(0) = 0$, $f(2^r) = 4^r$ for all integers $r$, $f$ is linear
on each interval $[2^r, 2^{r+1}]$, and $f(t) = f(-t)$ for all $t \in \mathbf{R}$. The graph of $f$ is
super-regular at $(0,0)$, but is not subsmooth there.

In a certain sense, however, the distinction between subsmoothness and
super-regularity is slight. Suppose the set $F$ is super-regular at every point
in $F \cap U$, for some open set $U \subset \mathbf{R}^n$. Since super-regularity implies Clarke
regularity, the normal cone and Clarke normal cone coincide throughout
$F \cap U$, and hence $F$ is also subsmooth throughout $F \cap U$. In other words, "local"
super-regularity coincides with "local" subsmoothness, which in turn, by [3,
Thm 3.16] coincides with the "first order Shapiro property" [36] (also called
"near convexity" in [38]) holding locally.

5 Alternating projections with nonconvexity 

Having reviewed or developed over the last few sections the key variational- 
analytic properties that we need, we now turn to projection algorithms. In 
this section we develop our convergence analysis of the method of alternating 
projections. The following result is our basic tool, guaranteeing conditions 
under which the method of alternating projections converges linearly. For 
flexibility, we state it in a rather technical manner. For clarity, we point out 
afterward that the two main conditions, (5.2) and (5.3), are guaranteed in
applications via assumptions of strong regularity and super-regularity (or in
particular, amenability or prox-regularity) respectively.

Theorem 5.1 (linear convergence of alternating projections)
Consider the closed sets $F, C \subset E$, and a point $\bar{x} \in F$. Fix any constant
$\epsilon > 0$. Suppose for some constant $c' \in (0, 1)$, the following condition holds:

(5.2)    $x \in F \cap (\bar{x} + \epsilon B),\ u \in -N_F(x) \cap B,\ y \in C \cap (\bar{x} + \epsilon B),\ v \in N_C(y) \cap B \;\Longrightarrow\; \langle u, v \rangle \le c'.$

Suppose furthermore for some constant $\delta \in [0, \frac{1 - c'}{2})$ the following condition
holds:

(5.3)    $y, z \in C \cap (\bar{x} + \epsilon B),\ v \in N_C(y) \cap B \;\Longrightarrow\; \langle v, z - y \rangle \le \delta \|z - y\|.$

Define a constant $c = c' + 2\delta < 1$. Then for any initial point $x_0 \in C$ satisfying
$\|x_0 - \bar{x}\| \le \frac{(1 - c)\epsilon}{4}$, any sequence of alternating projections on the sets $F$ and
$C$,

    $x_{2n+1} \in P_F(x_{2n})$ and $x_{2n+2} \in P_C(x_{2n+1})$ $(n = 0, 1, 2, \ldots)$

must converge with R-linear rate $\sqrt{c}$ to a point $\hat{x} \in F \cap C$ satisfying the
inequality $\|\hat{x} - x_0\| \le \frac{1+c}{1-c} \|x_0 - \bar{x}\|$.

Proof First note, by the definition of the projections we have

(5.4)    $\|x_{2n+3} - x_{2n+2}\| \le \|x_{2n+2} - x_{2n+1}\| \le \|x_{2n+1} - x_{2n}\|.$

Clearly we therefore have

(5.5)    $\|x_{2n+2} - x_{2n}\| \le 2\|x_{2n+1} - x_{2n}\|.$

We next claim

(5.6)    $\|x_{2n+1} - \bar{x}\| \le \frac{\epsilon}{2}$ and $\|x_{2n+1} - x_{2n}\| \le \frac{\epsilon}{2} \;\Longrightarrow\; \|x_{2n+2} - x_{2n+1}\| \le c \|x_{2n+1} - x_{2n}\|.$

To see this, note that if $x_{2n+2} = x_{2n+1}$, the result is trivial, and if $x_{2n+1} = x_{2n}$
then $x_{2n+2} = x_{2n+1}$ so again the result is trivial. Otherwise, we have

    $\frac{x_{2n} - x_{2n+1}}{\|x_{2n} - x_{2n+1}\|} \in N_F(x_{2n+1}) \cap B$

while

    $\frac{x_{2n+2} - x_{2n+1}}{\|x_{2n+2} - x_{2n+1}\|} \in -N_C(x_{2n+2}) \cap B.$

Furthermore, using inequality (5.4), the left-hand side of the implication (5.6)
ensures

    $\|x_{2n+2} - \bar{x}\| \le \|x_{2n+2} - x_{2n+1}\| + \|x_{2n+1} - \bar{x}\| \le \|x_{2n+1} - x_{2n}\| + \|x_{2n+1} - \bar{x}\| \le \epsilon.$

Hence, by assumption (5.2) we deduce

    $\Big\langle \frac{x_{2n} - x_{2n+1}}{\|x_{2n} - x_{2n+1}\|}, \frac{x_{2n+2} - x_{2n+1}}{\|x_{2n+2} - x_{2n+1}\|} \Big\rangle \le c',$

so

    $\langle x_{2n} - x_{2n+1}, x_{2n+2} - x_{2n+1} \rangle \le c' \|x_{2n} - x_{2n+1}\| \cdot \|x_{2n+2} - x_{2n+1}\|.$

On the other hand, by assumption (5.3) we know

    $\langle x_{2n} - x_{2n+2}, x_{2n+1} - x_{2n+2} \rangle \le \delta \|x_{2n} - x_{2n+2}\| \cdot \|x_{2n+1} - x_{2n+2}\|$
    $\le 2\delta \|x_{2n} - x_{2n+1}\| \cdot \|x_{2n+2} - x_{2n+1}\|,$

using inequality (5.5). Adding this inequality to the previous inequality, and
noting that the two left-hand sides sum to $\|x_{2n+2} - x_{2n+1}\|^2$, then
gives the right-hand side of (5.6), as desired.

Now let $\alpha = \|x_0 - \bar{x}\|$. We will show by induction the inequalities

(5.7)    $\|x_{2n+1} - \bar{x}\| \le 2\alpha \frac{1 - c^{n+1}}{1 - c} \le \frac{\epsilon}{2}$
(5.8)    $\|x_{2n+1} - x_{2n}\| \le \alpha c^n \le \frac{\epsilon}{2}$
(5.9)    $\|x_{2n+2} - x_{2n+1}\| \le \alpha c^{n+1}.$

Consider first the case $n = 0$. Since $x_1 \in P_F(x_0)$ and $\bar{x} \in F$, we deduce
$\|x_1 - x_0\| \le \|\bar{x} - x_0\| = \alpha \le \epsilon/2$, which is inequality (5.8). Furthermore,

    $\|x_1 - \bar{x}\| \le \|x_1 - x_0\| + \|x_0 - \bar{x}\| \le 2\alpha \le \frac{\epsilon}{2},$

which shows inequality (5.7). Finally, since $\|x_1 - x_0\| \le \epsilon/2$ and
$\|x_1 - \bar{x}\| \le \epsilon/2$, the implication (5.6) shows

    $\|x_2 - x_1\| \le c \|x_1 - x_0\| \le c \|\bar{x} - x_0\| = c\alpha,$

which is inequality (5.9).

For the induction step, suppose inequalities (5.7), (5.8), and (5.9) all hold
for some $n$. Inequalities (5.4) and (5.9) imply

(5.10)    $\|x_{2n+3} - x_{2n+2}\| \le \alpha c^{n+1} \le \frac{\epsilon}{2}.$

We also have, using inequalities (5.10), (5.9), and (5.7),

    $\|x_{2n+3} - \bar{x}\| \le \|x_{2n+3} - x_{2n+2}\| + \|x_{2n+2} - x_{2n+1}\| + \|x_{2n+1} - \bar{x}\|$
    $\le \alpha c^{n+1} + \alpha c^{n+1} + 2\alpha \frac{1 - c^{n+1}}{1 - c},$

so

(5.11)    $\|x_{2n+3} - \bar{x}\| \le 2\alpha \frac{1 - c^{n+2}}{1 - c} \le \frac{\epsilon}{2}.$

Now implication (5.6) with $n$ replaced by $n + 1$ implies

    $\|x_{2n+4} - x_{2n+3}\| \le c \|x_{2n+3} - x_{2n+2}\|,$

and using inequality (5.10) we deduce

(5.12)    $\|x_{2n+4} - x_{2n+3}\| \le \alpha c^{n+2}.$

Since inequalities (5.11), (5.10), and (5.12) are exactly inequalities (5.7),
(5.8), and (5.9) with $n$ replaced by $n + 1$, the induction step is complete and
our claim follows.

We can now easily check that the sequence $(x_k)$ is Cauchy and therefore
converges. To see this, note for any integer $n = 0, 1, 2, \ldots$ and any integer
$k > 2n$, we have

    $\|x_k - x_{2n}\| \le \sum_{j=2n}^{k-1} \|x_{j+1} - x_j\|$
    $\le \alpha (c^n + c^{n+1} + c^{n+1} + c^{n+2} + c^{n+2} + \cdots)$

so

    $\|x_k - x_{2n}\| \le \alpha c^n \frac{1 + c}{1 - c},$

and a similar argument shows

(5.13)    $\|x_{k+1} - x_{2n+1}\| \le \frac{2\alpha c^{n+1}}{1 - c}.$

Hence $x_k$ converges to some point $\hat{x} \in E$, and for all $n = 0, 1, 2, \ldots$ we have

(5.14)    $\|\hat{x} - x_{2n}\| \le \alpha c^n \frac{1 + c}{1 - c}$ and $\|\hat{x} - x_{2n+1}\| \le \frac{2\alpha c^{n+1}}{1 - c}.$

We deduce that the limit $\hat{x}$ lies in the intersection $F \cap C$ and satisfies the
inequality $\|\hat{x} - x_0\| \le \alpha \frac{1+c}{1-c}$, and furthermore that the convergence is R-linear
with rate $\sqrt{c}$, which completes the proof. □

To apply Theorem 5.1 to alternating projections between a closed and a
super-regular set, we make use of the key geometric property of super-regular
sets (Proposition 4.5): at any point near a point where a set is super-regular
the angle between any normal vector and the direction to any nearby point
in the set cannot be much less than $\pi/2$.

We can now prove our key result. 

Theorem 5.15 (alternating projections with a super-regular set)

Consider closed sets $F, C \subset \mathbf{E}$ and a point $\bar{x} \in F \cap C$. Suppose $C$ is
super-regular at $\bar{x}$ (as holds, for example, if it is amenable or prox-regular there).
Suppose furthermore that $F$ and $C$ have strongly regular intersection at $\bar{x}$:
that is, the condition

\[
N_F(\bar{x}) \cap -N_C(\bar{x}) = \{0\}
\]

holds, or equivalently, the constant

\[
\bar{c} = \max\bigl\{ \langle u, v \rangle : u \in N_F(\bar{x}) \cap B,\ v \in -N_C(\bar{x}) \cap B \bigr\} \tag{5.16}
\]

is strictly less than one. Fix any constant $c \in (\bar{c}, 1)$. Then, for any initial
point $x_0 \in C$ close to $\bar{x}$, any sequence of iterated projections

\[
x_{2n+1} \in P_F(x_{2n}) \quad\text{and}\quad x_{2n+2} \in P_C(x_{2n+1}) \qquad (n = 0, 1, 2, \ldots)
\]

must converge to a point in $F \cap C$ with R-linear rate $\sqrt{c}$.

Proof Let us show first the equivalence between $\bar{c} < 1$ and strong regularity.
The compactness of the intersections between normal cones and the unit
ball guarantees the existence of $u$ and $v$ achieving the maximum in (5.16).
Observe then that

\[
\langle u, v \rangle \le \|u\|\,\|v\| \le 1.
\]

The cases of equality in the Cauchy-Schwarz inequality allow us to write

\[
\bar{c} = 1 \iff u \text{ and } v \text{ are collinear} \iff N_F(\bar{x}) \cap -N_C(\bar{x}) \ne \{0\},
\]

which corresponds to the desired equivalence.

Fix now any constant $c' \in (\bar{c}, c)$ and define $\delta = \frac{c - c'}{2}$. To apply Theorem
5.1 we just need to check the existence of a constant $\epsilon > 0$ such that
conditions (5.2) and (5.3) hold. Condition (5.3) holds for all small $\epsilon > 0$, by
Proposition 4.5. On the other hand, if condition (5.2) fails for all small $\epsilon > 0$,
then there exist sequences of points $x_r \to \bar{x}$ in the set $F$ and $y_r \to \bar{x}$ in the set
$C$, and sequences of vectors $u_r \in -N_F(x_r) \cap B$ and $v_r \in N_C(y_r) \cap B$, satisfying
$\langle u_r, v_r \rangle > c'$. After taking subsequences, we can suppose $u_r$ approaches
some vector $u \in -N_F(\bar{x}) \cap B$ and $v_r$ approaches some vector $v \in N_C(\bar{x}) \cap B$,
and then $\langle u, v \rangle \ge c' > \bar{c}$, contradicting the definition of the constant $\bar{c}$. □
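
As an informal illustration of Theorem 5.15 (a sketch under our own assumptions, not part of the analysis), the iteration can be tried on a toy instance in $\mathbf{R}^2$. The sets, the starting point and the iteration count below are hypothetical choices: $F$ is the unit circle and $C$ is the line $y = 0.5$. At the intersection point $\bar{x} = (\sqrt{0.75}, 0.5)$ the normal cones are the lines spanned by $\bar{x}$ and by $(0,1)$, so the constant in (5.16) equals $0.5$ for this instance.

```python
import numpy as np

# Toy sets in R^2 (illustrative assumptions, not from the text):
# F = unit circle (a smooth manifold), C = horizontal line {y = 0.5} (convex).
def proj_F(x):
    """Nearest point on the unit circle (unique whenever x != 0)."""
    return x / np.linalg.norm(x)

def proj_C(x):
    """Nearest point on the line y = 0.5."""
    return np.array([x[0], 0.5])

x_bar = np.array([np.sqrt(0.75), 0.5])   # a point of F ∩ C
x = np.array([0.9, 0.5])                 # initial point x_0 in C, close to x_bar

iterates = [x]
for n in range(15):
    x = proj_F(x)            # x_{2n+1} in P_F(x_{2n})
    iterates.append(x)
    x = proj_C(x)            # x_{2n+2} in P_C(x_{2n+1})
    iterates.append(x)

# Lengths of successive steps and their ratios.
steps = [np.linalg.norm(b - a) for a, b in zip(iterates, iterates[1:])]
ratios = [s1 / s0 for s0, s1 in zip(steps, steps[1:])]

print("final iterate:", iterates[-1])
print("last step ratios:", ratios[-4:])
```

Both sets in this instance are super-regular, so the ratios of successive step lengths would be expected to stay below any constant $c > 0.5$ once the iterates are close to $\bar{x}$, in line with Corollary 5.17.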

Corollary 5.17 (improved convergence rate) With the assumptions of
Theorem 5.15, suppose the set $F$ is also super-regular at $\bar{x}$. Then the
alternating projection sequence converges with R-linear rate $c$.

Proof Inequality (5.6), and its analog when the roles of $F$ and $C$ are
interchanged, together show

\[
\|x_{k+1} - x_k\| \le c\,\|x_k - x_{k-1}\|
\]

for all large $k$, and the result then follows easily, using an argument analogous
to that at the end of the proof of Theorem 5.1. □

In the light of our discussion in the previous section, the strong regularity
assumption of Theorem 5.15 is equivalent to the metric regularity at $\bar{x}$ for $0$
of the set-valued mapping $\Psi : \mathbf{E} \rightrightarrows \mathbf{E}^2$ defined by

\[
\Psi(x) = (F - x) \times (C - x), \quad \text{for } x \in \mathbf{E}.
\]

Using equation (3.3), the regularity modulus is determined by

\[
\frac{1}{\operatorname{reg}\Psi(\bar{x}|0)}
= \min\bigl\{ \|u + v\| : u \in N_F(\bar{x}),\ v \in N_C(\bar{x}),\ \|u\|^2 + \|v\|^2 = 1 \bigr\},
\]

and a short calculation then shows

\[
\operatorname{reg}\Psi(\bar{x}|0) = \frac{1}{\sqrt{1 - \bar{c}}}. \tag{5.18}
\]

The closer the constant $\bar{c}$ is to one, the larger the regularity modulus. We
have shown that $\bar{c}$ also controls the speed of linear convergence for the method
of alternating projections applied to the sets $F$ and $C$.
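
To make the relationship between the constant in (5.16) and the regularity modulus concrete, the following sketch checks formula (5.18) numerically in the simplest case we could think of: two lines through the origin in $\mathbf{R}^2$ meeting at an angle `theta`. The angle and the grid size are arbitrary illustrative choices, and the brute-force minimization is only a sanity check, not a general method.

```python
import numpy as np

theta = np.pi / 3                                  # illustrative angle between the lines
n_F = np.array([0.0, 1.0])                         # unit normal to F = span{(1, 0)}
n_C = np.array([-np.sin(theta), np.cos(theta)])    # unit normal to C = span{(cos t, sin t)}

# For lines through the origin, the normal cones are the lines spanned by
# n_F and n_C, so the maximum in (5.16) is |<n_F, n_C>|.
c_bar = abs(n_F @ n_C)

# Brute-force the minimum of ||u + v|| over u = a*n_F, v = b*n_C
# subject to ||u||^2 + ||v||^2 = a^2 + b^2 = 1.
t = np.linspace(0.0, 2.0 * np.pi, 20001)
a, b = np.cos(t), np.sin(t)
sums = a[:, None] * n_F + b[:, None] * n_C
reg_numeric = 1.0 / np.linalg.norm(sums, axis=1).min()

# Right-hand side of (5.18).
reg_formula = 1.0 / np.sqrt(1.0 - c_bar)

print(c_bar, reg_numeric, reg_formula)
```

As the angle `theta` shrinks towards zero, `c_bar` approaches one and both values of the modulus grow without bound, which matches the remark that poorly conditioned intersections have large regularity modulus.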

Inevitably, Theorem 5.15 concerns local convergence: it relies on finding
an initial point $x_0$ sufficiently close to a point of strongly regular intersection.
How might we find such a point?

One natural context in which to pose this question is that of sensitivity 
analysis. Suppose we already know a point of strongly regular intersection of 
a closed set and a subspace, but now want to find a point in the intersection 
of two slight perturbations of these sets. The following result shows that, 
starting from the original point of intersection, the method of alternating 
projections will converge linearly to the new intersection. 

Theorem 5.19 (perturbed intersection) With the assumptions of
Theorem 5.15, for any small vector $d \in \mathbf{E}$, the method of alternating projections
applied to the sets $d + F$ and $C$, with the initial point $\bar{x} \in C$, will converge
with R-linear rate $\sqrt{c}$ to a point $\hat{x} \in (d + F) \cap C$ satisfying
$\|\hat{x} - \bar{x}\| \le \frac{1+c}{1-c}\|d\|$.

Proof As in the proof of Theorem 5.15, if we fix any constant $c' \in (\bar{c}, c)$
and define $\delta = \frac{c - c'}{2}$, then there exists a constant $\epsilon > 0$ such that conditions
(5.2) and (5.3) hold. Suppose the vector $d$ satisfies

\[
\|d\| \le \frac{(1-c)\epsilon}{8} \le \frac{\epsilon}{2}.
\]

Since

\[
y \in (C - d) \cap \bigl(\bar{x} + \tfrac{\epsilon}{2}B\bigr) \text{ and } v \in N_{C-d}(y)
\ \Longrightarrow\
y + d \in C \cap (\bar{x} + \epsilon B) \text{ and } v \in N_C(y + d),
\]

we deduce from condition (5.2) the implication

\[
\left.
\begin{array}{l}
x \in F \cap \bigl(\bar{x} + \tfrac{\epsilon}{2}B\bigr),\ u \in -N_F(x) \cap B \\
y \in (C - d) \cap \bigl(\bar{x} + \tfrac{\epsilon}{2}B\bigr),\ v \in N_{C-d}(y) \cap B
\end{array}
\right\}
\ \Longrightarrow\ \langle u, v \rangle \le c'.
\]

Furthermore, using condition (5.3) we deduce the implication

\[
\begin{array}{rl}
& y, z \in (C - d) \cap \bigl(\bar{x} + \tfrac{\epsilon}{2}B\bigr) \text{ and } v \in N_{C-d}(y) \cap B \\
\Longrightarrow & y + d,\ z + d \in C \cap (\bar{x} + \epsilon B) \text{ and } v \in N_C(y + d) \cap B \\
\Longrightarrow & \langle v, z - y \rangle \le \delta\|z - y\|.
\end{array}
\]

We can now apply Theorem 5.1 with the set $C$ replaced by $C - d$ and the
constant $\epsilon$ replaced by $\frac{\epsilon}{2}$. We deduce that the method of alternating
projections on the sets $F$ and $C - d$, starting at the point $x_0 = \bar{x} - d \in C - d$,
converges with R-linear rate $\sqrt{c}$ to a point $\hat{x} \in F \cap (C - d)$ satisfying the
inequality $\|\hat{x} - x_0\| \le \frac{1+c}{1-c}\|x_0 - \bar{x}\|$. The theorem statement then follows by
translation. □
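
The sensitivity result of Theorem 5.19 can be illustrated on the same kind of toy instance as before. In this sketch, which rests entirely on our own hypothetical choices, $F$ is the unit circle, $C$ is the line $y = 0.5$, and `d` is a small shift applied to $F$. The iteration starts from the unperturbed intersection point, and the final distance is compared with the bound $\frac{1+c}{1-c}\|d\|$ for a constant `c` slightly above $0.5$.

```python
import numpy as np

def proj_circle(x):
    """Projection onto the unit circle F (unique for x != 0)."""
    return x / np.linalg.norm(x)

def proj_line(x):
    """Projection onto the line C = {y = 0.5}."""
    return np.array([x[0], 0.5])

x_bar = np.array([np.sqrt(0.75), 0.5])   # point of F ∩ C before perturbation
d = np.array([0.01, -0.02])              # illustrative small perturbation of F

def proj_shifted_circle(x):
    """Projection onto d + F, obtained by translating the projection onto F."""
    return d + proj_circle(x - d)

x = x_bar.copy()                          # start at the original intersection point
for _ in range(40):
    x = proj_line(proj_shifted_circle(x))
x_hat = x

c = 0.51                                  # any constant slightly above c_bar = 0.5 here
bound = (1 + c) / (1 - c) * np.linalg.norm(d)
print("distance moved:", np.linalg.norm(x_hat - x_bar), "bound:", bound)
```

In this instance the limit stays well inside the predicted ball around $\bar{x}$; the theorem only guarantees this for sufficiently small perturbations, and how small is not quantified by the sketch.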



Lack of convexity notwithstanding, more structure sometimes implies that
the method of alternating projections converges Q-linearly, rather than just
R-linearly, on a neighborhood of a point of strongly regular intersection of two
closed sets. One example is the case of two manifolds [25].



6 Inexact alternating projections 

Our basic tool, the method of alternating projections for a super-regular set
$C$ and an arbitrary closed set $F$, is a conceptual algorithm that may be
challenging to realize in practice. We might reasonably consider the case of exact
projections on the super-regular set $C$: for example, in the next section, for
the method of averaged projections, $C$ is a subspace and computing
projections is trivial. However, projecting onto the set $F$ may be much harder, so
a more realistic analysis allows relaxed projections.

We sketch one approach. Given two iterates $x_{2n-1} \in F$ and $x_{2n} \in C$, a
necessary condition for the new iterate $x_{2n+1}$ to be an exact projection on $F$,
that is $x_{2n+1} \in P_F(x_{2n})$, is

\[
\|x_{2n+1} - x_{2n}\| \le \|x_{2n} - x_{2n-1}\|
\quad\text{and}\quad
x_{2n} - x_{2n+1} \in N_F(x_{2n+1}).
\]

In the following result we assume only that we choose the iterate $x_{2n+1}$ to
satisfy a relaxed version of this condition, where we replace the second part
by the assumption that the distance

\[
d_{N_F(x_{2n+1})}\!\left( \frac{x_{2n} - x_{2n+1}}{\|x_{2n} - x_{2n+1}\|} \right)
\]

from the normal cone at the iterate to the normalized direction of the last
step is small.

Theorem 6.1 (inexact alternating projections) With the assumptions
of Theorem 5.15, fix any constant $\epsilon < \sqrt{1 - c^2}$, and consider the following
inexact alternating projection iteration. Given any initial points $x_0 \in C$ and
$x_1 \in F$, for $n = 1, 2, 3, \ldots$ suppose

\[
x_{2n} \in P_C(x_{2n-1})
\]

and $x_{2n+1} \in F$ satisfies

\[
\|x_{2n+1} - x_{2n}\| \le \|x_{2n} - x_{2n-1}\|
\quad\text{and}\quad
d_{N_F(x_{2n+1})}\!\left( \frac{x_{2n} - x_{2n+1}}{\|x_{2n} - x_{2n+1}\|} \right) \le \epsilon.
\]

Then, providing $x_0$ and $x_1$ are close to $\bar{x}$, the iterates converge to a point in
$F \cap C$ with R-linear rate

\[
\sqrt{c\sqrt{1 - \epsilon^2} + \epsilon\sqrt{1 - c^2}} < 1.
\]

Sketch proof. Once again as in the proof of Theorem 5.15, we fix any
constant $c' \in (\bar{c}, c)$ and define $\delta = \frac{c - c'}{2}$, so there exists a constant $\epsilon > 0$ such
that conditions (5.2) and (5.3) hold. Define a vector

\[
z = \frac{x_{2n} - x_{2n+1}}{\|x_{2n} - x_{2n+1}\|}.
\]

By assumption, there exists a vector $w \in N_F(x_{2n+1})$ satisfying $\|w - z\| \le \epsilon$.
Some elementary manipulation then shows that the unit vector $\hat{w} = \|w\|^{-1}w$
satisfies

\[
\langle \hat{w}, z \rangle \ge \sqrt{1 - \epsilon^2}.
\]

As in the proof of Theorem 5.1, assuming inductively that $x_{2n+1}$ is close to
both $\bar{x}$ and $x_{2n}$, since $\hat{w} \in N_F(x_{2n+1})$, and

\[
u = \frac{x_{2n+2} - x_{2n+1}}{\|x_{2n+2} - x_{2n+1}\|} \in -N_C(x_{2n+2}),
\]

we deduce

\[
\langle \hat{w}, u \rangle \le c'.
\]

We now see that, on the unit sphere, the arc distance between the unit
vectors $\hat{w}$ and $z$ is no more than $\arccos(\sqrt{1 - \epsilon^2})$, whereas the arc distance
between $\hat{w}$ and the unit vector $u$ is at least $\arccos c'$. Hence by the triangle
inequality, the arc distance between $z$ and $u$ is at least

\[
\arccos c' - \arccos(\sqrt{1 - \epsilon^2}),
\]

so

\[
\langle z, u \rangle \le \cos\bigl( \arccos c' - \arccos(\sqrt{1 - \epsilon^2}) \bigr)
= c'\sqrt{1 - \epsilon^2} + \epsilon\sqrt{1 - c'^2}.
\]

Some elementary calculus shows that the quantity on the right-hand side is
strictly less than one. Again as in the proof of Theorem 5.1, this inequality
shows, providing $x_0$ is close to $\bar{x}$, the inequality

\[
\|x_{2n+2} - x_{2n+1}\| \le \bigl( c\sqrt{1 - \epsilon^2} + \epsilon\sqrt{1 - c^2} \bigr)\|x_{2n+1} - x_{2n}\|,
\]

and in conjunction with the inequality

\[
\|x_{2n+1} - x_{2n}\| \le \|x_{2n} - x_{2n-1}\|,
\]

this suffices to complete the proof by induction. □
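
The rate in Theorem 6.1 depends on the tolerance $\epsilon$ through the expression $\sqrt{c\sqrt{1-\epsilon^2} + \epsilon\sqrt{1-c^2}}$. The small helper below, whose name `inexact_rate` and sample values are our own illustrative choices, evaluates this expression and shows how the guaranteed rate degrades as the tolerance grows. It also reflects the identity used in the sketch proof: the quantity under the outer square root is the cosine of $\arccos c - \arcsin \epsilon$.

```python
import math

def inexact_rate(c, eps):
    """Evaluate sqrt(c*sqrt(1 - eps^2) + eps*sqrt(1 - c^2)), the rate in Theorem 6.1.

    Requires 0 <= c < 1 and 0 <= eps < sqrt(1 - c^2), as in the theorem.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("c must lie in [0, 1)")
    if not 0.0 <= eps < math.sqrt(1.0 - c * c):
        raise ValueError("eps must satisfy 0 <= eps < sqrt(1 - c^2)")
    return math.sqrt(c * math.sqrt(1.0 - eps * eps) + eps * math.sqrt(1.0 - c * c))

# Illustrative values: c = 0.5 and a few tolerances below sqrt(1 - c^2) ~ 0.866.
rates = [inexact_rate(0.5, eps) for eps in (0.0, 0.1, 0.3, 0.6)]
for eps, rate in zip((0.0, 0.1, 0.3, 0.6), rates):
    print(eps, rate)
```

With `eps = 0` the helper returns $\sqrt{c}$, the rate of the exact method in Theorem 5.15, and the value increases towards one as `eps` approaches $\sqrt{1 - c^2}$.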



7 Local convergence for averaged projections 

We now return to the problem of finding a point in the intersection of several
closed sets using the method of averaged projections. The results of the
previous section are applied to the method of averaged projections via the
well-known reformulation of the algorithm as alternating projections on a
product space. This leads to the main result of this section, Theorem 7.3,
which shows linear convergence in a neighborhood of any point of strongly
regular intersection, at a rate governed by the associated regularity modulus.

We begin with a characterization of strongly regular intersection, relating
the condition modulus with a generalized notion of angle for several sets.
Such notions, for collections of convex sets, have also been studied recently
in the context of projection algorithms in [12,13].

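Proposition 7.1 (variational characterization of strong regularity)

Closed sets $F_1, F_2, \ldots, F_m \subset \mathbf{E}$ have strongly regular intersection at a point
$\bar{x} \in \cap_i F_i$ if and only if the optimal value $\bar{c}$ of the optimization problem

\[
\begin{array}{ll}
\text{maximize} & \sum_i \langle u_i, v_i \rangle \\
\text{subject to} & \sum_i \|u_i\|^2 \le 1 \\
 & \sum_i u_i = 0 \\
 & \sum_i \|v_i\|^2 \le 1 \\
 & u_i \in \mathbf{E},\ v_i \in N_{F_i}(\bar{x}) \quad (i = 1, 2, \ldots, m)
\end{array}
\]

is strictly less than one. Indeed, we have

\[
\bar{c}^2 =
\begin{cases}
0 & (\bar{x} \in \cap_i \operatorname{int} F_i) \\[4pt]
1 - \dfrac{1}{m \cdot \operatorname{cond}(F_1, F_2, \ldots, F_m|\bar{x})^2} & \text{(otherwise).}
\end{cases}
\tag{7.2}
\]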



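Proof When $\bar{x} \in \cap_i \operatorname{int} F_i$, the result follows by definition. Henceforth, we
therefore rule out that case.

For any vectors $u_i, v_i \in \mathbf{E}$ ($i = 1, 2, \ldots, m$), by Lagrangian duality and
differentiation we obtain

\[
\begin{aligned}
& \max_{u_i \in \mathbf{E}} \Bigl\{ \sum_i \langle u_i, v_i \rangle : \sum_i \|u_i\|^2 \le 1,\ \sum_i u_i = 0 \Bigr\} \\
&\quad = \min_{\lambda \in \mathbf{R}_+,\ z \in \mathbf{E}}\ \max_{u_i \in \mathbf{E}}
   \Bigl\{ \sum_i \langle u_i, v_i \rangle + \frac{\lambda}{2}\Bigl(1 - \sum_i \|u_i\|^2\Bigr) + \Bigl\langle z, \sum_i u_i \Bigr\rangle \Bigr\} \\
&\quad = \min_{\lambda \in \mathbf{R}_+,\ z \in \mathbf{E}}
   \Bigl\{ \frac{\lambda}{2} + \sum_i \max_{u_i \in \mathbf{E}} \Bigl\{ \langle u_i, v_i + z \rangle - \frac{\lambda}{2}\|u_i\|^2 \Bigr\} \Bigr\} \\
&\quad = \min_{\lambda > 0,\ z \in \mathbf{E}}
   \Bigl\{ \frac{\lambda}{2} + \frac{1}{2\lambda}\sum_i \|v_i + z\|^2 \Bigr\} \\
&\quad = \min_{z \in \mathbf{E}} \sqrt{\sum_i \|v_i + z\|^2} \\
&\quad = \sqrt{\sum_i \Bigl\| v_i - \frac{1}{m}\sum_j v_j \Bigr\|^2} \\
&\quad = \sqrt{\sum_i \|v_i\|^2 - \frac{1}{m}\Bigl\|\sum_i v_i\Bigr\|^2}.
\end{aligned}
\]

Consequently, $\bar{c}^2$ is the optimal value of the optimization problem

\[
\begin{array}{ll}
\text{maximize} & \sum_i \|v_i\|^2 - \dfrac{1}{m}\Bigl\|\sum_i v_i\Bigr\|^2 \\
\text{subject to} & \sum_i \|v_i\|^2 \le 1 \\
 & v_i \in N_{F_i}(\bar{x}) \quad (i = 1, 2, \ldots, m).
\end{array}
\]

By homogeneity, the optimal solution must occur when the inequality
constraint is active, so we obtain an equivalent problem by replacing that
constraint by the corresponding equation. By (3.3) and the definition of the
condition modulus it follows that the optimal value of this new problem is

\[
1 - \frac{1}{m \cdot \operatorname{cond}(F_1, F_2, \ldots, F_m|\bar{x})^2}
\]

as required. □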
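
The closed form produced by the Lagrangian duality argument can be sanity-checked numerically. The sketch below uses random vectors $v_i$ (sizes and seed are arbitrary assumptions) and compares the value $\sqrt{\sum_i \|v_i\|^2 - \frac{1}{m}\|\sum_i v_i\|^2}$ with the objective at a natural candidate maximizer, namely the centred and normalized $v_i$. It also samples random feasible points, none of which should exceed the closed-form value.

```python
import numpy as np

rng = np.random.default_rng(0)
m, dim = 4, 3                                # illustrative sizes
v = rng.standard_normal((m, dim))            # rows play the role of v_1, ..., v_m

# Closed form: sqrt(sum ||v_i||^2 - (1/m) ||sum v_i||^2).
closed_form = np.sqrt((v ** 2).sum() - np.linalg.norm(v.sum(axis=0)) ** 2 / m)

# Candidate maximizer: centre the v_i and normalize, so that
# sum u_i = 0 and sum ||u_i||^2 = 1.
centered = v - v.mean(axis=0)
u_star = centered / np.linalg.norm(centered)
primal_value = (u_star * v).sum()            # sum_i <u_i, v_i>

# Random feasible points should never beat the closed-form value.
best_random = -np.inf
for _ in range(2000):
    u = rng.standard_normal((m, dim))
    u -= u.mean(axis=0)                      # enforce sum u_i = 0
    u /= np.linalg.norm(u)                   # enforce sum ||u_i||^2 = 1
    best_random = max(best_random, (u * v).sum())

print(primal_value, closed_form, best_random)
```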



Theorem 7.3 (linear convergence of averaged projections) Suppose
closed sets $F_1, F_2, \ldots, F_m \subset \mathbf{E}$ have strongly regular intersection at a point
$\bar{x} \in \cap_i F_i$. Define a constant $\bar{c} \in [0, 1)$ by equation (7.2), and fix any constant
$c \in (\bar{c}, 1)$. Then for any initial point $x_0 \in \mathbf{E}$ close to $\bar{x}$, the method of averaged
projections converges to a point in the intersection $\cap_i F_i$, with R-linear rate
$c$ (and if each set $F_i$ is super-regular at $\bar{x}$, or in particular, prox-regular or
amenable there, then the convergence rate is $c^2$). Furthermore, for any small
perturbations $d_i \in \mathbf{E}$ for $i = 1, 2, \ldots, m$, the method of averaged projections
applied to the sets $d_i + F_i$, with the initial point $\bar{x}$, converges linearly to a
nearby point in the intersection, with R-linear rate $c$.

Proof In the product space $\mathbf{E}^m$ with the inner product

\[
\bigl\langle (u_1, u_2, \ldots, u_m), (v_1, v_2, \ldots, v_m) \bigr\rangle = \sum_i \langle u_i, v_i \rangle,
\]

we consider the closed set

\[
F = \prod_i F_i
\]

and the subspace

\[
L = \{Ax : x \in \mathbf{E}\},
\]

where the linear map $A : \mathbf{E} \to \mathbf{E}^m$ is defined by $Ax = (x, x, \ldots, x)$. Notice
$A\bar{x} \in F \cap L$, and it is easy to check

\[
N_F(A\bar{x}) = \prod_i N_{F_i}(\bar{x})
\]

and

\[
L^{\perp} = \Bigl\{ (u_1, u_2, \ldots, u_m) : \sum_i u_i = 0 \Bigr\}.
\]

Hence $F_1, F_2, \ldots, F_m$ have strongly regular intersection at $\bar{x}$ if and only if $F$
and $L$ have strongly regular intersection at the point $A\bar{x}$. This latter property
is equivalent to the constant $\bar{c}$ defined in Theorem 5.15 (with $C = L$) being
strictly less than one. But that constant agrees exactly with that defined
by equation (7.2), so we show next that we can apply Theorem 5.15 and
Theorem 5.19.

To see this note that, for any point $x \in \mathbf{E}$, we have the equivalence

\[
(z_1, z_2, \ldots, z_m) \in P_F(Ax) \iff z_i \in P_{F_i}(x) \quad (i = 1, 2, \ldots, m).
\]

Furthermore a quick calculation shows, for any $z_1, z_2, \ldots, z_m \in \mathbf{E}$,

\[
P_L(z_1, z_2, \ldots, z_m) = \frac{1}{m} A(z_1 + z_2 + \cdots + z_m).
\]

Hence in fact the method of averaged projections for the sets $F_1, F_2, \ldots, F_m$,
starting at an initial point $x_0$, is essentially identical with the method of
alternating projections for the sets $F$ and $L$, starting at the initial point $Ax_0$.
If $x_0, x_1, x_2, \ldots$ is a possible sequence of iterates for the former method, then a
possible sequence of even iterates for the latter method is $Ax_0, Ax_1, Ax_2, \ldots$.
For $x_0$ close to $\bar{x}$, this latter sequence must converge to a point $A\hat{x} \in F \cap L$
with R-linear rate $c$, by Theorem 5.15 and its corollary. Thus the sequence
$x_0, x_1, x_2, \ldots$ converges to $\hat{x} \in \cap_i F_i$ at the same linear rate. When each of the
sets $F_i$ is super-regular at $\bar{x}$, it is easy to check that the Cartesian product $F$
is super-regular at $A\bar{x}$, so the rate is $c^2$. The last part of the theorem follows
from Theorem 5.19. □
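
The product-space reformulation used in this proof is easy to mirror in code. The following sketch, built on a hypothetical two-set instance in $\mathbf{R}^2$ (unit circle and the line $y = 0.5$), runs averaged projections directly and, in parallel, alternating projections between the product set and the diagonal subspace. Points of the product space are stored as arrays with one row per set; all names are illustrative.

```python
import numpy as np

def proj_circle(x):
    """Projection onto the unit circle."""
    return x / np.linalg.norm(x)

def proj_line(x):
    """Projection onto the line y = 0.5."""
    return np.array([x[0], 0.5])

projections = [proj_circle, proj_line]       # P_{F_1}, P_{F_2}
m = len(projections)

def averaged_step(x):
    """One iteration of averaged projections in the original space."""
    return sum(P(x) for P in projections) / m

def P_F(z):
    """Projection onto the product F_1 x ... x F_m: act blockwise on the rows of z."""
    return np.stack([P(z_i) for P, z_i in zip(projections, z)])

def P_L(z):
    """Projection onto the diagonal subspace L: replace every block by the mean."""
    return np.tile(z.mean(axis=0), (z.shape[0], 1))

x = np.array([0.9, 0.6])                     # initial point x_0
z = np.tile(x, (m, 1))                       # A x_0 = (x_0, ..., x_0)
for _ in range(200):
    x = averaged_step(x)
    z = P_L(P_F(z))                          # one full cycle of alternating projections

print("averaged projections:", x)
print("product-space iterate:", z[0])
```

Every row of `z` coincides with the averaged-projection iterate `x`, which is the identification $Ax_0, Ax_1, Ax_2, \ldots$ described in the proof.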

Applying Theorem 6.1 to the product-space formulation of averaged
projections shows in a similar fashion that an inexact variant of the method of
averaged projections will also converge linearly.

Remark 7.4 (strong regularity and local extremality) We notice, in 
the language of [26] , that we have proved algorithmically that if closed sets 
have strongly regular intersection at a point, then that point is not "locally 
extremal" . 

Remark 7.5 (alternating versus averaged projections) Consider a
feasibility problem involving two super-regular sets $F_1$ and $F_2$. Assume that
strong regularity holds at $\bar{x} \in F_1 \cap F_2$ and set $k = \operatorname{cond}(F_1, F_2|\bar{x})$. Theorem
7.3 gives a bound on the rate of convergence of the method of averaged
projections as

\[
r_{av} \le 1 - \frac{1}{2k^2}.
\]

Notice that each iteration involves two projections: one onto each of the sets
$F_1$ and $F_2$. On the other hand, Corollary 5.17 and (5.18) give a bound on
the rate of convergence of the method of alternating projections as

\[
r_{alt} \le 1 - \frac{1}{k^2},
\]

and each iteration involves just one projection. Thus we note that our bound
on the rate of alternating projections $r_{alt}$ is always better than the bound
on the rate of averaged projections $r_{av}$. At least from the perspective of
this analysis, averaged projections seems to have no advantage over
alternating projections, although our proof of linear convergence for alternating
projections needs a super-regularity assumption not necessary in the case of
averaged projections.
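
For a quick feel for the comparison in Remark 7.5, the sketch below tabulates the two bounds for a few values of the condition modulus. It assumes the bounds read $r_{av} \le 1 - \frac{1}{2k^2}$ and $r_{alt} \le 1 - \frac{1}{k^2}$, as follows from (7.2) with $m = 2$ and from (5.18); the function name and the sample values of `k` are our own.

```python
def rate_bounds(k):
    """Return (r_av, r_alt), the assumed bounds for two super-regular sets.

    k stands for cond(F_1, F_2 | x_bar), which is at least one.
    """
    r_av = 1.0 - 1.0 / (2.0 * k ** 2)    # averaged: two projections per iteration
    r_alt = 1.0 - 1.0 / k ** 2           # alternating: one projection per iteration
    return r_av, r_alt

table = {k: rate_bounds(k) for k in (1.0, 2.0, 10.0)}
for k, (r_av, r_alt) in table.items():
    # r_alt ** 2 is the alternating bound over two projections, i.e. at equal cost.
    print(k, r_av, r_alt, r_alt ** 2)
```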



8 Prox-regularity and averaged projections 

If we assume that the sets $F_1, F_2, \ldots, F_m$ are prox-regular, then we can refine
our understanding of local convergence for the method of averaged
projections using a completely different approach, explored in this section.

Proposition 8.1 Around any point $\bar{x}$ at which the set $F \subset \mathbf{E}$ is prox-regular,
the squared distance to $F$ is continuously differentiable, and its gradient
$\nabla d_F^2 = 2(I - P_F)$ has Lipschitz constant 2.

Proof This result corresponds essentially to [31, Prop 3.1], which yields
the smoothness of $d_F^2$ together with the gradient formula. The proof of that
proposition also shows that for any small $\delta > 0$, all points $x_1, x_2 \in \mathbf{E}$ near $\bar{x}$
satisfy the inequality

\[
\langle x_1 - x_2, P_F(x_1) - P_F(x_2) \rangle \ge (1 - \delta)\|P_F(x_1) - P_F(x_2)\|^2
\]

(see "Claim" in [31, p. 5239]). Consequently we have

\[
\begin{aligned}
& \|(I - P_F)(x_1) - (I - P_F)(x_2)\|^2 - \|x_1 - x_2\|^2 \\
&\quad = \|(x_1 - x_2) - (P_F(x_1) - P_F(x_2))\|^2 - \|x_1 - x_2\|^2 \\
&\quad = -2\langle x_1 - x_2, P_F(x_1) - P_F(x_2) \rangle + \|P_F(x_1) - P_F(x_2)\|^2 \\
&\quad \le (2\delta - 1)\|P_F(x_1) - P_F(x_2)\|^2 \\
&\quad \le 0,
\end{aligned}
\]

provided we choose $\delta < 1/2$. □

As before, consider sets $F_1, F_2, \ldots, F_m \subset \mathbf{E}$ and a point $\bar{x} \in \cap_i F_i$, but
now let us suppose moreover that each set $F_i$ is prox-regular at $\bar{x}$. Define a
function $f : \mathbf{E} \to \mathbf{R}$ by

\[
f = \frac{1}{2m}\sum_{i=1}^m d_{F_i}^2. \tag{8.2}
\]

This function is half the mean-squared-distance from the point $x$ to the set
system $\{F_i\}$. According to the preceding result, $f$ is continuously
differentiable around $\bar{x}$, and its gradient

\[
\nabla f = \frac{1}{m}\sum_{i=1}^m (I - P_{F_i}) = I - \frac{1}{m}\sum_{i=1}^m P_{F_i} \tag{8.3}
\]

is Lipschitz continuous with constant 1 on a neighborhood of $\bar{x}$. The method
of averaged projections constructs the new iterate $x^+ \in \mathbf{E}$ from the old iterate
$x \in \mathbf{E}$ via the update

\[
x^+ = \frac{1}{m}\sum_{i=1}^m P_{F_i}(x) = x - \nabla f(x), \tag{8.4}
\]

so we can interpret it as the method of steepest descent with a step size of
one when the sets $F_i$ are all prox-regular. To understand its convergence, we
return to our strong regularity assumption.
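
The interpretation of averaged projections as a unit-step gradient method can be checked numerically on a small example. In this sketch the two sets (unit circle and the line $y = 0.5$), the test point and the finite-difference step are all illustrative assumptions. The function `f` is half the mean squared distance to the sets, and the gradient is approximated by central differences rather than by the formula $\nabla f = I - \frac{1}{m}\sum_i P_{F_i}$.

```python
import numpy as np

def proj_circle(x):
    """Projection onto the unit circle."""
    return x / np.linalg.norm(x)

def proj_line(x):
    """Projection onto the line y = 0.5."""
    return np.array([x[0], 0.5])

projections = [proj_circle, proj_line]
m = len(projections)

def f(x):
    """Half the mean squared distance from x to the sets."""
    return sum(np.sum((x - P(x)) ** 2) for P in projections) / (2 * m)

def averaged_projection(x):
    """The averaged-projection update (1/m) * sum_i P_i(x)."""
    return sum(P(x) for P in projections) / m

def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

x = np.array([0.9, 0.6])
gradient_step = x - numerical_gradient(f, x)     # steepest descent with step size one
print(gradient_step, averaged_projection(x))
```

Up to finite-difference error, the two printed vectors agree.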

The condition modulus controls the behavior of normal vectors not just 
at the point x but also at nearby points. 

Proposition 8.5 (local effect of condition modulus) Consider closed
sets $F_1, F_2, \ldots, F_m \subset \mathbf{E}$ having strongly regular intersection at a point
$\bar{x} \in \cap_i F_i$, and any constant

\[
k > \operatorname{cond}(F_1, F_2, \ldots, F_m|\bar{x}).
\]

Then for any points $x_i \in F_i$ near $\bar{x}$, any vectors $y_i \in N_{F_i}(x_i)$ (for
$i = 1, 2, \ldots, m$) satisfy the inequality

\[
\sum_i \|y_i\|^2 \le k^2 \Bigl\|\sum_i y_i\Bigr\|^2.
\]

Proof If the result fails, then we can find sequences of points $x_i^r \to \bar{x}$ in
$F_i$ and sequences of vectors $y_i^r \in N_{F_i}(x_i^r)$ (for $i = 1, 2, \ldots, m$) satisfying the
inequality

\[
\sum_i \|y_i^r\|^2 > k^2 \Bigl\|\sum_i y_i^r\Bigr\|^2
\]

for all $r = 1, 2, \ldots$. Define new vectors

\[
u_i^r = \frac{1}{\sqrt{\sum_j \|y_j^r\|^2}}\, y_i^r \in N_{F_i}(x_i^r)
\]

for each index $i = 1, 2, \ldots, m$ and $r$. Notice

\[
\sum_i \|u_i^r\|^2 = 1 \quad\text{and}\quad \Bigl\|\sum_i u_i^r\Bigr\| < \frac{1}{k}.
\]

For each $i = 1, 2, \ldots, m$, the sequence $u_i^1, u_i^2, \ldots$ is bounded, so after taking
subsequences we can suppose it converges to some vector $u_i \in \mathbf{E}$, and since
the normal cone $N_{F_i}$ is closed as a set-valued mapping from $F_i$ to $\mathbf{E}$, we
deduce $u_i \in N_{F_i}(\bar{x})$. But then we have

\[
\sum_i \|u_i\|^2 = 1 \quad\text{and}\quad \Bigl\|\sum_i u_i\Bigr\| \le \frac{1}{k},
\]

contradicting the definition of the condition modulus $\operatorname{cond}(F_1, F_2, \ldots, F_m|\bar{x})$.
The result follows. □



The size of the gradient of the mean-squared-distance function $f$, defined
by equation (8.2), is closely related to the value of the function near a point
of strongly regular intersection. To be precise, we have the following result.

Proposition 8.6 (gradient of mean-squared-distance) Consider
prox-regular sets $F_1, \ldots, F_m \subset \mathbf{E}$ having strongly regular intersection at a point
$\bar{x} \in \cap_i F_i$, and any constant

\[
k > \operatorname{cond}(F_1, F_2, \ldots, F_m|\bar{x}).
\]

Then on a neighborhood of $\bar{x}$, the mean-squared-distance function

\[
f = \frac{1}{2m}\sum_{i=1}^m d_{F_i}^2
\]

satisfies the inequalities

\[
\frac{1}{2}\|\nabla f\|^2 \le f \le \frac{k^2 m}{2}\|\nabla f\|^2. \tag{8.7}
\]

Proof Consider any point $x \in \mathbf{E}$ near $\bar{x}$. By equation (8.3), we know

\[
\nabla f(x) = \frac{1}{m}\sum_{i=1}^m y_i,
\]

where

\[
y_i = x - P_{F_i}(x) \in N_{F_i}(P_{F_i}(x))
\]

for each $i = 1, 2, \ldots, m$. By definition, we have

\[
f(x) = \frac{1}{2m}\sum_{i=1}^m \|y_i\|^2.
\]

Using inequality (2.4), we obtain

\[
m^2\|\nabla f(x)\|^2 = \Bigl\|\sum_{i=1}^m y_i\Bigr\|^2 \le m\sum_{i=1}^m \|y_i\|^2 = 2m^2 f(x).
\]

On the other hand, since $x$ is near $\bar{x}$, so are the projections $P_{F_i}(x)$, so

\[
2m f(x) = \sum_i \|y_i\|^2 \le k^2 \Bigl\|\sum_i y_i\Bigr\|^2 = k^2 m^2 \|\nabla f(x)\|^2,
\]

by Proposition 8.5. The result now follows. □
A standard argument now gives the main result of this section. 

Theorem 8.8 (Q-linear convergence for averaged projections)

Consider prox-regular sets $F_1, F_2, \ldots, F_m \subset \mathbf{E}$ having strongly regular
intersection at a point $\bar{x} \in \cap_i F_i$, and any constant $k > \operatorname{cond}(F_1, F_2, \ldots, F_m|\bar{x})$.
Then, starting from any point near $\bar{x}$, one iteration of the method of averaged
projections reduces the mean-squared-distance

\[
f = \frac{1}{2m}\sum_{i=1}^m d_{F_i}^2
\]

by a factor of at least $1 - \frac{1}{k^2 m}$.

Proof Consider any point $x \in \mathbf{E}$ near $\bar{x}$. The function $f$ is continuously
differentiable around the minimizer $\bar{x}$, so the gradient $\nabla f(x)$ must be small,
and hence the new iterate $x^+ = x - \nabla f(x)$ must also be near $\bar{x}$. Hence, as
we observed after equation (8.3), the gradient $\nabla f$ has Lipschitz constant one
on a neighborhood of the line segment $[x, x^+]$. Consequently,

\[
\begin{aligned}
f(x^+) - f(x)
&= \int_0^1 \frac{d}{dt} f(x - t\nabla f(x))\,dt \\
&= \int_0^1 \langle -\nabla f(x), \nabla f(x - t\nabla f(x)) \rangle\,dt \\
&= \int_0^1 \Bigl( -\|\nabla f(x)\|^2 + \langle \nabla f(x), \nabla f(x) - \nabla f(x - t\nabla f(x)) \rangle \Bigr)\,dt \\
&\le -\|\nabla f(x)\|^2 + \int_0^1 \|\nabla f(x)\| \cdot \|\nabla f(x) - \nabla f(x - t\nabla f(x))\|\,dt \\
&\le -\|\nabla f(x)\|^2 + \int_0^1 \|\nabla f(x)\|^2\, t\,dt \\
&= -\frac{1}{2}\|\nabla f(x)\|^2 \\
&\le -\frac{1}{k^2 m} f(x),
\end{aligned}
\]

using Proposition 8.6. □
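
The one-step decrease guaranteed by Theorem 8.8 can be observed in the simplest linear setting. The sketch below takes two lines through the origin in $\mathbf{R}^2$ at an angle `theta` (an arbitrary illustrative choice), for which the constant in (5.16) is $|\cos\theta|$ and hence, by (5.18), the squared condition modulus is $1/(1 - |\cos\theta|)$. It then compares the worst observed ratio $f(x^+)/f(x)$ over random points with the factor $1 - \frac{1}{k^2 m}$ for a constant $k$ slightly larger than the condition modulus.

```python
import numpy as np

theta = np.pi / 3                               # illustrative angle between the lines
a = np.array([1.0, 0.0])
b = np.array([np.cos(theta), np.sin(theta)])
P1, P2 = np.outer(a, a), np.outer(b, b)         # orthogonal projectors onto the two lines
m = 2

def f(x):
    """Half the mean squared distance to the two lines."""
    return (np.sum((x - P1 @ x) ** 2) + np.sum((x - P2 @ x) ** 2)) / (2 * m)

cond_sq = 1.0 / (1.0 - abs(np.cos(theta)))      # cond^2 via (5.18) with c_bar = |cos(theta)|
k_sq = 1.01 * cond_sq                           # any k > cond is allowed
factor = 1.0 - 1.0 / (k_sq * m)                 # guaranteed reduction factor

rng = np.random.default_rng(1)
worst_ratio = 0.0
for _ in range(1000):
    x = rng.standard_normal(2)
    x_plus = (P1 @ x + P2 @ x) / m              # one averaged-projection step
    worst_ratio = max(worst_ratio, f(x_plus) / f(x))

print("worst observed ratio:", worst_ratio, "guaranteed factor:", factor)
```

In this linear example the observed ratios are noticeably smaller than the guaranteed factor, which is consistent with the theorem giving an upper bound rather than an exact rate.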

A simple induction argument now gives an independent proof in the
prox-regular case that the method of averaged projections converges linearly to a
point in the intersection of the given sets. Specifically, the result above shows
that the mean-squared-distance $f(x_k)$ decreases by at least a constant factor at
each iteration, and Proposition 8.6 shows that the size of the step $\|\nabla f(x_k)\|$
also decreases by a constant factor. Hence the sequence $(x_k)$ must converge
R-linearly to a point in the intersection.

Comparing this result to Theorem 7.3 (linear convergence of averaged
projections), we see that the predicted rates of linear convergence are the
same. Theorem 7.3 guarantees that the squared distance to the intersection
converges to zero with R-linear rate $c^2$ (for any constant $c \in (\bar{c}, 1)$). The
argument gives no guarantee about improvements in a particular iteration:
it only describes the asymptotic behavior of the iterates. By contrast, the
argument of Theorem 8.8, with the added assumption of prox-regularity,
guarantees the same behavior but with the stronger information that the
mean-squared-distance decreases monotonically to zero with Q-linear rate
$c^2$. In particular, each iteration must decrease the mean-squared-distance.

9 A Numerical Example 

In this final section, we give a numerical illustration showing the linear con- 
vergence of alternating and averaged projections algorithms. Some major 
problems in signal or image processing come down to reconstructing an ob- 
ject from as few linear measurements as possible. Several recovery procedures 
from randomly sampled signals have been proved to be effective when com- 
bined with sparsity constraints (see for instance the recent developments of 
compressed sensing [16], [14]). These optimization problems can be cast as 
linear programs. However for extremely large and/or nonlinear problems, 
projection methods become attractive alternatives. In the spirit of com- 
pressive sampling we use projection algorithms to optimize the compression 
matrix. This speculative example is meant simply to illustrate the theory 
rather than make any claim on real applications. 

We consider the decomposition of images $x \in \mathbf{R}^n$ as $x = Wz$ where
$W \in \mathbf{R}^{n \times m}$ ($n < m$) is a "dictionary" (that is, a redundant collection of
basis vectors). Compressed sensing consists in linearly reducing $x$ to
$y = Px = PWz$ with the help of a compression matrix $P \in \mathbf{R}^{d \times n}$ (with $d \ll n$);
the inverse operation is to recover $x$ (or $z$) from $y$. Compressed sensing theory
gives sparsity conditions on $z$ to ensure exact recovery [16], [14]. Reference
[16] in fact proposes a recovery algorithm based on alternating projections
(on two convex sets). In general, we might want to design a specific sensing
matrix $P$ adapted to $W$, to ease this recovery process. An initial investigation
on this question is [17]; we suggest here another direction, inspired by [6]
and [18], where averaged projections naturally appear.

Candès and Romberg [6] showed that, under orthogonality conditions,
sparse recovery is more efficient when the entries $|(PW)_{ij}|$ are small. One
could thus use the componentwise norm of $PW$ as a measure of quality
of $P$. This leads to the following feasibility problem: to find $U = PW$ such
that $UU^T = I$ and with the infinity norm constraint $\|U\|_\infty \le \alpha$ (for a fixed
tolerance $\alpha$). The sets corresponding to these constraints are given by

\[
\begin{aligned}
L &= \{U \in \mathbf{R}^{d \times m} : U = PW \text{ for some } P \in \mathbf{R}^{d \times n}\}, \\
M &= \{U \in \mathbf{R}^{d \times m} : UU^T = I\}, \\
C &= \{U \in \mathbf{R}^{d \times m} : \|U\|_\infty \le \alpha\}.
\end{aligned}
\]

The first set $L$ is a subspace, the second set $M$ is a smooth manifold, while the
third set $C$ is convex; hence all three are prox-regular. Moreover we can easily
compute the projections. The projection onto the linear subspace $L$ can be
computed with a pseudo-inverse. The manifold $M$ corresponds to the set of
matrices $U$ whose singular values are all one; the projection onto $M$ is
obtained by computing the singular value decomposition
of $U$ and setting the singular values to 1 (apply for example Theorem 27 of [25]).
Finally the projection onto $C$ comes by shrinking entries of $U$ (specifically, we
apply $\min\{\max\{u_{ij}, -\alpha\}, \alpha\}$ to each entry $u_{ij}$). This feasibility problem
can thus be treated by projection algorithms, and hopefully a matrix
$U \in L \cap M \cap C$ will correspond to a good compression matrix $P$.
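
The three projections just described are straightforward to express with NumPy. The following is a minimal sketch under our own assumptions, not the code behind the experiment reported here: the problem sizes are much smaller than the $128 \times 512$ dictionary used in the text, and the tolerance `alpha`, the random seed and the iteration count are arbitrary. The function `f` is half the mean squared distance to the three sets.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 16, 64, 4              # small illustrative sizes (assumption)
alpha = 0.3                      # illustrative tolerance (assumption)
W = rng.standard_normal((n, m))  # random dictionary

# Orthogonal projector onto the row space of W, built from the pseudo-inverse.
row_space_projector = np.linalg.pinv(W) @ W

def proj_L(U):
    """Project onto L = {P W}: project each row of U onto the row space of W."""
    return U @ row_space_projector

def proj_M(U):
    """Project onto M = {U : U U^T = I}: set all singular values to one."""
    A, _, Bt = np.linalg.svd(U, full_matrices=False)
    return A @ Bt

def proj_C(U):
    """Project onto C = {U : max |u_ij| <= alpha} by clipping each entry."""
    return np.clip(U, -alpha, alpha)

def f(U):
    """Half the mean squared distance from U to the three sets."""
    return sum(np.sum((U - P(U)) ** 2) for P in (proj_L, proj_M, proj_C)) / 6.0

U = proj_L(rng.standard_normal((d, m)))    # initial iterate U_0 in L
history = [f(U)]
for _ in range(200):
    U = (proj_L(U) + proj_M(U) + proj_C(U)) / 3.0   # averaged projections
    history.append(f(U))

print("f at start:", history[0], "f at end:", history[-1])
```

Plotting `np.log10(history)` against the iteration index gives a curve of the kind discussed in this section; whether it is close to linear depends on the random instance and on `alpha` not being too small.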

To illustrate this problem, we generate random entries (normally
distributed) of the dictionary $W$ (size $128 \times 512$, redundancy factor 4) and of
an initial iterate $U_0 \in L$. We fix $\alpha = 0.1$, and we run the averaged projection
algorithm, which computes a sequence of iterates $U_k$. Figure 1 shows

\[
\log_{10} f(U_k) \quad\text{with}\quad f(U) = \frac{1}{6}\bigl( d_L^2(U) + d_M^2(U) + d_C^2(U) \bigr)
\]

for each iteration $k$. We also observe that the ratio

\[
f(U_{k+1})/f(U_k) < 0.9627
\]

for all iterations $k$, showing the expected Q-linear convergence. We note that
working on random test cases is of interest for our simple testing of averaged
projections: though we cannot guarantee in fact that the intersection of
the three sets is strongly regular, randomness seems to prevent irregular
solutions, providing $\alpha$ is not too small. So in this situation, it is likely
that the algorithm will converge linearly; this is indeed what we observe in
Figure 1. We note furthermore that we tested alternating projections on this
problem (involving three sets, so not explicitly covered by Theorem 5.15).
We observed that the method is still converging linearly in practice, and
again, the rate is better than for averaged projections.

Figure 1: Convergence of averaged projection algorithm for designing com- 
pression matrix in compressed sensing. 

This example illustrates how the projection algorithm behaves on random
feasibility problems of this type. However, the potential benefits in practice of
using an optimized compression matrix rather than a random one
are still unclear. Further study and more complete testing are needed to
settle these questions; this is beyond the scope of this paper.